An Effective Data Sampling Procedure for Imbalanced Data Learning on Health Insurance Fraud Detection
Abstract
Fraud detection has received considerable and growing attention from academic research and industry worldwide. Insurance datasets are enormous, with skewed distributions and high dimensionality. Skewed class distribution and data volume are considered significant problems when analyzing insurance datasets, as these issues increase misclassification rates. Although sampling approaches such as random oversampling and SMOTE can help balance the data, they also increase computational complexity and can lead to a deterioration of the model's performance. More sophisticated techniques are therefore needed to balance the classes efficiently. This work focuses on optimizing the learner for fraud detection by applying a Fused Resampling and Cleaning Ensemble (FusedRCE) for effective sampling in health insurance fraud detection. We hypothesized that meticulous oversampling followed by guided data cleaning would improve prediction performance and the learner's understanding of the minority fraudulent classes compared with other sampling techniques. The proposed model works in three steps. In the first step, PCA is applied to extract the necessary features and reduce the dimensions of the data. In the second step, a hybrid combination of k-means clustering and oversampling is used to resample the imbalanced data. Oversampling introduces a lot of noise into the data, so in the third step a thorough cleaning is performed on the balanced data, using the Tomek Link algorithm to remove the noisy samples generated during oversampling. The Tomek Link algorithm clears the boundary between the minority and majority classes and makes the data more precise and freer from noise. The resultant dataset is used by four different classification algorithms: Logistic Regression, Decision Tree Classifier, k-Nearest Neighbors, and Neural Networks, with repeated 5-fold cross-validation. Compared across these classifiers, FusedRCE had the highest average rate of 98.9%. The results were also measured using F1 score, Precision, Recall, and AUC values. The results obtained show that the proposed method performs significantly better than any other approach, predicting fraud with greater accuracy and a 3x speed-up in training.
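To make the cleaning step more concrete, here is a minimal sketch of Tomek link removal in plain Python. It is not the authors' implementation: the function names, the toy data, and the choice to drop only the majority-class member of each link are assumptions made for illustration. A Tomek link is a pair of samples from different classes that are each other's nearest neighbors, so such pairs sit right on the class boundary.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_majority(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link.
    (Assumption: some variants drop both members instead.)"""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.add(i if labels[i] == majority_label else j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: 0 = legitimate claim (majority), 1 = fraudulent (minority).
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.1, 3.2)]
y = [0, 0, 0, 1, 1, 1]

links = tomek_links(X, y)
X_clean, y_clean = remove_tomek_majority(X, y, majority_label=0)
```

In this toy set only the pair at indices 2 and 3 forms a Tomek link, so the legitimate sample at (1.0, 1.0) is removed and the boundary around the fraudulent samples becomes cleaner. The brute-force neighbor search is quadratic in the number of samples; a real pipeline on insurance-scale data would need an indexed nearest-neighbor search.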
Similar resources
Outlier-based Health Insurance Fraud Detection for U.S. Medicaid Data
Fraud, waste, and abuse in the U.S. healthcare system are estimated at $700 billion annually. Predictive analytics offers government and private payers the opportunity to identify and prevent or recover such billings. This paper proposes a data-driven method for fraud detection based on comparative research, fraud cases, and literature review. Unsupervised data mining techniques such as outlier...
Combining Data Mining and Machine Learning for Effective Fraud Detection
This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specific...
Financial Reporting Fraud Detection: An Analysis of Data Mining Algorithms
In the last decade, high profile financial frauds committed by large companies in both developed and developing countries were discovered and reported. This study compares the performance of five popular statistical and machine learning models in detecting financial statement fraud. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements betw...
The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance
Given the growing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying appropriate and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a loss claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Improved Sampling Techniques for Learning an Imbalanced Data Set
This paper presents the performance of a classifier built using the stackingC algorithm in nine different data sets. Each data set is generated using a sampling technique applied on the original imbalanced data set. Five new sampling techniques are proposed in this paper (i.e., SMOTERandRep, Lax Random Oversampling, Lax Random Undersampling, Combined-Lax Random Oversampling Undersampling, and C...
Journal
Journal title: Journal of Computing and Information Technology
Year: 2021
ISSN: 1846-3908, 1330-1136
DOI: https://doi.org/10.20532/cit.2020.1005216